Skip to content

feat(storage): shard repo-axis GSI1 partition (#23)#34

Merged
pureliture merged 1 commit into
mainfrom
claude/issue-23-repo-gsi1-sharding
Jun 18, 2026
Merged

feat(storage): shard repo-axis GSI1 partition (#23)#34
pureliture merged 1 commit into
mainfrom
claude/issue-23-repo-gsi1-sharding

Conversation

@pureliture

Copy link
Copy Markdown
Contributor

What

Shard the per-repo REPO#<repo> GSI1 partition into
REPO#<repo>#SHARD#00..15 (repoAxisVersion=2, repoAxisShardCount=16) so
per-repo entity rows no longer collide on a single hot partition, while
repo-local reads stay logically identical.

Why

The single REPO#<repo> GSI1 partition is a hot-partition risk at cloud scale
(#23): every per-repo entity (FINDING, FINDING_OBSERVATION, FINDING_STATE,
STATE_EVENT, SCAN_LEDGER, GHAS_ALERT) writes to one partition key per repo.

Changes

  • repo_axis.py: RepoAxisKey + repo_axis_inputs — single source of truth for
    the per-entity sharded key formula, shared by write mappers and backfill so the
    two cannot drift. Sole owner of the REPO# / #SHARD# literals.
  • items.py: all 6 repo-axis mappers build the base item, then merge the sharded
    GSI1 projection derived from the item's own fields. STATE_EVENT gsi1sk gains
    an eventSeq tie-breaker suffix; the rule-axis GSI2 is unchanged.
  • repo_axis_reader.py: read_repo_axis scatter-gather fan-out across all shards
    with a migration-only include_legacy fallback; fails closed on a shard error,
    dedupes by (PK, SK) preferring the higher repoAxisVersion, returns canonical
    (gsi1sk, PK, SK) order.
  • store.py / branch_residual.py: read_observations_for_repo and
    residual_for_repo route through the fan-out reader (include_legacy threaded,
    default False). Direct primary-key reads (read_finding_state,
    has_scan_ledger, read_finding_state_events) are unchanged.
  • repo_axis_migration.py: in-place conditional backfill (not a row copy) with a
    per-entity inventory / backfilled / skipped / failed / remaining report and a
    documented compatibility-removal gate.
  • Tests: new tests/test_repo_axis_sharding.py (helper, fan-out reader,
    migration, regression scan); updated store and CLI residual tests. The
    regression scan bans any #SHARD# literal, raw gsi1pk=REPO#, and GSI1 reads
    on the raw REPO# partition outside the single helper module.

Implements docs/workbench/specs/issue-23-repo-gsi1-sharding/design.md (M1–M5)
plus the post-implementation multi-agent review fixes.

Test

  • uv run pytest — full suite green (616 passed)
  • uv run ruff check / uv run ruff format — clean

Checklist

  • Tests pass locally
  • Lint/format clean
  • Behavior-preserving for existing reads (primary-key paths unchanged)
  • Cloud rollout / canary / backfill execution — out of scope (design gate only)

Closes #23

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements GSI1 sharding for the REPO# partition to mitigate hot partition risks under Issue #23. It introduces a deterministic 16-shard key generator, a scatter-gather reader to merge sharded results, and an in-place backfill migration utility, while updating write mappers and repository stores to support the sharded schema. The review feedback highlights two key improvement opportunities: first, a potential NameError on Python versions prior to 3.11 in transport.py due to referencing DynamoDbCompatibleConfig inside its own class without future annotations; second, a performance issue in repo_axis_migration.py where redundant full table scans are performed to compute remaining legacy items, which can be optimized by setting remaining = failed.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.


@classmethod
def from_env(cls) -> "DynamoDbCompatibleConfig":
def from_env(cls) -> DynamoDbCompatibleConfig:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Since from __future__ import annotations is not imported at the top of this file, referencing DynamoDbCompatibleConfig as a return type annotation inside its own class definition will raise a NameError at import time on Python versions prior to 3.11.

To prevent runtime import errors on older Python versions, please revert this annotation to a string literal "DynamoDbCompatibleConfig", or add from __future__ import annotations at the very top of the file.

Suggested change
def from_env(cls) -> DynamoDbCompatibleConfig:
def from_env(cls) -> "DynamoDbCompatibleConfig":

skipped += 1
else:
failed += 1
remaining = len(_scan_legacy(table, entity_type))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In DynamoDB, a FilterExpression is applied after the scan reads the data from disk but before returning it to the client. This means a full table scan always consumes the same amount of Read Capacity Units (RCUs) regardless of the filter.

Calling _scan_legacy(table, entity_type) again here to compute remaining triggers an additional full table scan for each entity type (6 extra scans in total). Since any item that was successfully backfilled or skipped is no longer legacy, the number of remaining legacy items is exactly equal to the number of failed items. You can directly assign remaining = failed to completely eliminate these 6 expensive scans.

Suggested change
remaining = len(_scan_legacy(table, entity_type))
remaining = failed

Spread per-repo rows across REPO#<repo>#SHARD#00..15 (repoAxisVersion=2,
shardCount=16) to remove the single hot REPO#<repo> GSI1 partition.

- RepoAxisKey + repo_axis_inputs: single source of truth for the per-entity
  sharded key formula; write mappers and backfill share it (no drift).
- read_repo_axis: scatter-gather fan-out with migration-only include_legacy
  fallback, fail-closed on shard error, dedupe by (PK,SK) preferring higher
  repoAxisVersion, canonical (gsi1sk,PK,SK) order.
- read_observations_for_repo / residual_for_repo route through the fan-out
  reader; direct primary-key reads unchanged.
- repo_axis_migration: in-place conditional backfill (not row copy) with a
  per-entity inventory/backfilled/skipped/failed/remaining report and a
  documented compatibility-removal gate.
- regression scan: single allowed construction site; bans any #SHARD# literal,
  raw gsi1pk=REPO#, and GSI1 reads on the raw REPO# partition.

Implements docs/workbench/specs/issue-23-repo-gsi1-sharding/design.md (M1-M5)
and the post-implementation multi-agent review fixes. Full suite green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@pureliture pureliture force-pushed the claude/issue-23-repo-gsi1-sharding branch from 7ec55e7 to de4facc Compare June 18, 2026 16:07
@pureliture pureliture merged commit 6648d7a into main Jun 18, 2026
9 checks passed
@pureliture pureliture deleted the claude/issue-23-repo-gsi1-sharding branch June 18, 2026 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Shard REPO# GSI1 partition for hot-partition safety at cloud scale

1 participant